ABSTRACT

RNAi technology has emerged over the past decade as a
potential modality for inhibiting viruses. A few siRNA
databases reported in the literature focus on targeting
human and mammalian genes, but experimentally validated
viral siRNA databases are lacking. We have developed
VIRsiRNAdb, a manually curated database with
comprehensive details of 1358 siRNAs/shRNAs targeting
viral genome regions. Further, wherever available,
information on alternative efficacies of more than
300 siRNAs derived from different assays has
also been incorporated. Important fields included in
the database are siRNA sequence, virus subtype, 
target genome region, cell type, target object, ex- 
perimental assay, efficacy, off-target and siRNA 
matching with reference viral sequences. The database
also provides users with facilities for advanced
search, browsing, data submission, linking to
external databases and useful siRNA analysis tools,
notably siTarAlign, which aligns the siRNA with
reference viral genomes or user-defined sequences.
VIRsiRNAdb contains extensive details of siRNA/ 
shRNA targeting 42 important human viruses 
including influenza virus, hepatitis B virus, HPV 
and SARS coronavirus. VIRsiRNAdb should prove
useful to researchers for selecting the best viral
siRNAs for antiviral therapeutics development and
for developing better viral siRNA design tools.
The database is freely available at
http://crdd.osdd.net/servers/virsirnadb.



INTRODUCTION 

Viral diseases remain a public health problem owing to the
emerging and re-emerging nature of viruses such as
influenza, hepatitis, Human Immunodeficiency Virus
(HIV), Human Papillomavirus (HPV) and Severe Acute
Respiratory Syndrome (SARS) (1). Combating the
majority of these viruses is hampered by the lack of
effective vaccines and antiviral drugs (2). Besides the
development of new vaccines and antiviral drugs, there are
continuous efforts to find alternative therapeutic
interventions. Lately, RNA interference (RNAi) has emerged as
a potential approach in the battle against pathogenic
viruses (3,4) and other human diseases (5,6).

RNAi was first reported by Fire et al. (7), who showed a
potent gene silencing effect after injecting double-stranded
RNA (dsRNA) into C. elegans. In the RNA silencing
pathway, long dsRNA is processed by an RNase III family
member, Dicer, into a 19-21 nucleotide double-stranded
siRNA with 2-nucleotide unphosphorylated 3' overhangs
(8). The double stranded siRNA is composed of a guide 
(antisense) strand and a passenger (sense) strand. 
Unwinding of the siRNA duplex is catalyzed by 
argonaute. After the unwinding step, the guide strand is 
incorporated into the RNA Induced Silencing Complex 
(RISC), while the passenger strand is released. Using the
antisense strand, RISC targets the complementary mRNA,
resulting in the cleavage of the latter (9).

Using the RNA silencing mechanism, researchers have
reported considerable decreases in the expression of
targeted viral genes (10,11). For example, siRNAs
directed against the influenza virus nucleocapsid (NP)
and RNA transcriptase (PA) genes inhibited viral
transcription and replication (12). Similarly, siRNAs against the
hepatitis B virus polyadenylation (PA), precore (PreC)
and surface (S) regions inhibited viral replication
(13). In another study, siRNAs synthesized to target the
E, M and N genes of SARS-CoV effectively
downregulated target gene expression by over 80% in a
dose-dependent manner (14). Inhibition of virus replication
for several human viruses using the RNAi strategy has
been reviewed (3,15,16).

The RNAi approach offers several advantages for antiviral
therapeutics development. It can target all types
of viral genomes [ssDNA, dsDNA, RNA(+), RNA(-)
and dsRNA], which allows this versatile mechanism to be
harnessed as a broad-spectrum antiviral therapy (4).
Further, RNAi targets a short stretch of viral nucleic
acid rather than a functional domain of a viral protein;
therefore, even a small viral genome offers many potential
targets (11). Moreover, multiple antiviral siRNAs can be
expressed simultaneously or pooled, in a way similar to
current combination antiviral drug therapy of infected
individuals, to sustain a prolonged effect (17,18).

In the past decade, a number of RNAi therapeutic
programs focusing on cancer, metabolic diseases,
respiratory disorders, retinal degeneration, dominantly
inherited brain and skin diseases and infectious
diseases have entered clinical practice (6,19,20).
Simultaneously, several RNAi-based antiviral therapeutic
projects have reached clinical trial stages (21), for
example, RSV (Phase II) (22), HBV (Phase I) (23), HCV
(Phase II) (24) and HIV (Phase I) (25). Ongoing clinical
trials further emphasize the need to develop
viral RNAi resources.

There is no dedicated viral siRNA database except
HIVsirDB, an HIV-specific siRNA database (26).
However, a few other siRNA databases have been
reported in the literature: HuSiDa (27) and siRNAdb
(28) provide sequences of published functional
siRNAs targeting human genes, siRecords (29)
focuses on siRNA data from mammalian RNAi experiments
and DSTHO (30) on human oncogenes. VIRsiRNAdb is
an attempt to provide comprehensive details of
experimentally validated viral siRNAs targeting the diverse
genome regions of as many as 42 important human
viruses on one platform, to help researchers working in
the field of siRNA-based antiviral therapeutic
development.

DATABASE CONTENT 

Data acquisition 

An exhaustive literature search was carried out to extract the
relevant articles from PubMed. This was accomplished with
queries combining two sets of keywords: (i)
terms most commonly used for gene silencing, viz. RNA
interference, RNAi, silencing, siRNA(s), shRNA(s), small
interfering RNA(s), short interfering RNA(s), small
hairpin RNA(s) and short hairpin RNA(s) and (ii) virus
names including their common names, aliases and
abbreviations (such as Severe acute respiratory syndrome, SARS,
Corona Virus, SARS-CoV). A full-text search using
these keyword combinations was performed for
each of the human viruses individually. The search
results are given in Supplementary Table S1. Around
4000 abstracts were screened to select the articles
likely to contain relevant viral siRNA information.
Reviews, general methodological articles and non-English
articles were not considered. After the initial screening,
around 1000 remaining potential articles were examined
in detail to retrieve the viral siRNA information. Articles
on siRNAs targeting host genome regions were
excluded. Further, articles that did not report individual
siRNA sequences or their experimental efficacies were also
excluded. After this extensive filtering, 221 research
articles were shortlisted for collecting the siRNA data.
Our database includes the complete siRNA data for almost
all human viruses reported in the literature.
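
The keyword pairing described above can be sketched as follows. This is a minimal illustration and not the authors' actual search procedure: the function name build_queries, the quoted AND query syntax and the use of singular term forms only are assumptions made for the example, while the silencing terms and the SARS aliases are taken from the text.

```python
from itertools import product
from typing import List

# Gene-silencing terms listed in the text (singular forms only, for brevity).
SILENCING_TERMS = [
    "RNA interference", "RNAi", "silencing", "siRNA", "shRNA",
    "small interfering RNA", "short interfering RNA",
    "small hairpin RNA", "short hairpin RNA",
]


def build_queries(virus_aliases: List[str]) -> List[str]:
    """Pair every silencing term with every alias of one virus.

    The quoted 'term AND alias' form is an assumed query syntax.
    """
    return [f'"{term}" AND "{alias}"'
            for term, alias in product(SILENCING_TERMS, virus_aliases)]


if __name__ == "__main__":
    sars_aliases = ["Severe acute respiratory syndrome", "SARS",
                    "Corona Virus", "SARS-CoV"]
    for query in build_queries(sars_aliases)[:3]:
        print(query)
```

Running one such query set per virus mirrors the statement that the search was performed for each human virus individually.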

Database architecture 

The database provides comprehensive information on
experimentally validated viral siRNAs, which includes:
(i) siRNA sequence, (ii) family of virus, (iii) virus
subtype, (iv) target gene, (v) siRNA location,
(vi) GenBank accession, (vii) design algorithm, (viii)
production method, (ix) siRNA concentration, (x) cell type,
(xi) transfection method, (xii) incubation time, (xiii)
PubMed ID, (xiv) object used (i.e. mRNA, protein, virus
load etc.), (xv) efficacy, (xvi) efficacy assay (e.g. Western
blot, PCR, plaque number, ELISA) and (xvii) references.
Further, wherever available, extended information
regarding alternative efficacy assays has also been provided.
The architecture of the database is depicted in Figure 1.
The structure of each siRNA, as predicted by Mfold (31), is also
displayed with the data. In addition, we have provided
information on viral siRNA off-targets in human and on
siRNA sequence matching with the reference viral genome
sequences.

Database statistics 

VIRsiRNAdb provides information on 1358
experimentally validated siRNAs pertaining to 42 important
human viruses belonging to 19 different virus families and
targeting as many as 150 different viral genome regions.
For HBV, HCV, SARS and Coxsackievirus, many genome
regions have been targeted by siRNAs, as given in
Table 1. The database entries cover siRNA experiments
in 71 different cell lines, of which Huh-7, 293T, MDCK,
HepG2.2.15 and HeLa were the most frequently used
(Figure 2a). In the database, 45% of the total siRNAs
were highly effective with >70% inhibition efficacy, including
9% with >90% efficacy. A further 23% of siRNAs have
moderate efficacy of 50-70%, whereas 32% of siRNAs
were less effective, with efficacy <50% (Figure 2b).

One of the major hindrances in RNAi-based therapeutics
is the lack of siRNA specificity. Besides directly affecting
the expression of the desired genes, an siRNA may
affect the regulation of unintended transcripts that possess
complementarity to the siRNA sequence. The siRNA
off-target effect was initially reported in 2003 (32), and
later Birmingham et al. (33) reported that
off-targeting is associated with the presence of one or
more perfect 3' untranslated region (UTR) matches with
the hexamer or heptamer seed region (positions 2-7 or
2-8) of the antisense strand of the siRNA. Seed-based
siRNA off-targeting has also been experimentally demonstrated
by others (34,35). The impact of non-specific siRNA
off-target effects on therapeutic applications has been
reviewed (36).
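
The seed-match criterion can be illustrated with a short sketch. It extracts the hexamer (positions 2-7) or heptamer (positions 2-8) seed of the antisense strand and reports perfect complementary sites in a 3'-UTR. The function names and the toy sequences are hypothetical, and this is not the Seed Locator implementation; it only shows the matching rule stated above.

```python
from typing import List

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def _to_dna(seq: str) -> str:
    """Upper-case a sequence and write U as T."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    return "".join(_COMPLEMENT[base] for base in reversed(_to_dna(seq)))


def seed_sites(antisense: str, utr: str, seed_len: int = 6) -> List[int]:
    """Return 0-based start positions in `utr` of perfect seed matches.

    The seed is taken from position 2 of the antisense strand
    (seed_len=6 gives positions 2-7, seed_len=7 gives positions 2-8).
    A transcript site pairs with the seed, so the UTR is searched for
    the reverse complement of the seed.
    """
    seed = _to_dna(antisense)[1:1 + seed_len]
    site = reverse_complement(seed)
    target = _to_dna(utr)
    hits = []
    start = target.find(site)
    while start != -1:
        hits.append(start)
        start = target.find(site, start + 1)
    return hits


if __name__ == "__main__":
    toy_antisense = "AACGTGACCTTAGGCATCA"   # hypothetical 19-mer
    toy_utr = "GGGTCACGTAAATCACGTCC"         # hypothetical 3'-UTR fragment
    print(seed_sites(toy_antisense, toy_utr))              # hexamer seed
    print(seed_sites(toy_antisense, toy_utr, seed_len=7))  # heptamer seed
```

Counting the returned positions per transcript distinguishes genes with a single seed match from those with multiple matches in the 3'-UTR.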

We have predicted the human off-targets for all the
siRNAs present in our database using three algorithms:
(i) BLAST (37), (ii) Seed Locator (33) and (iii)
SpecificityServer (38). The outputs of each algorithm



D232 Nucleic Acids Research, 2012, Vol. 40, Database issue 




Figure 1. VIRsiRNAdb database architecture. 



are given against the respective siRNA record as a link under
the off-target column. The BLAST algorithm is commonly used
to detect possible off-target effects of an siRNA by
searching it against the human UniGene or transcriptome
database (28). We also used BLAST (37) with the parameters
-e 1000, -q -4 and -r 5 and found that around
13% of siRNAs have off-targets in the human genome.
The Seed Locator output includes the total number of genes
with at least one seed match and with multiple seed matches
in the 3'-UTR.
Finally, results from SpecificityServer, which is designed to
identify potential non-specific matches to an siRNA, showed
that 113 siRNAs are non-specific for both siRNA strands,
while 101 siRNAs have off-targets for the sense strand and
the remainder do not have any off-target.

Because viruses exhibit high genetic variability,
it is important to know how many
viral genome sequences a given siRNA sequence matches.
This analysis helps users select siRNAs
that match the maximum number of reference
viral strains. The significance of selecting conserved regions
as siRNA targets in HIV-1 has been discussed by Naito
et al. (39,40). We checked siRNA sequence
matching against the reference viral genome sequences
available at NCBI. For this purpose, we used the ALIGN0
algorithm (41), which computes the alignment of two
DNA sequences without penalizing end-gaps. A pie
chart displays the number of nucleotide differences
or mismatches (0, 1, 2, 3, >3) between each siRNA and
the respective viral reference genome sequences in the
alignment. Cumulative results for all siRNAs showed that
2% of siRNAs matched all (100%) of the respective
viral genome sequences, 16% matched 90-99% of the
viral genomes and 61% matched <50%, as
shown in Figure 2c.
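
The mismatch binning used for the pie charts can be sketched as follows. Note that this is a simplified stand-in and not ALIGN0: it slides the siRNA along the genome without gaps and takes the window with the fewest mismatches, whereas ALIGN0 computes a gapped alignment with free end-gaps. The function names are hypothetical, and the siRNA is assumed to be given in the same orientation as the genome sequence.

```python
def _to_dna(seq: str) -> str:
    """Upper-case a sequence and write U as T."""
    return seq.upper().replace("U", "T")


def min_mismatches(sirna: str, genome: str) -> int:
    """Fewest mismatches between the siRNA and any same-length genome window."""
    query, target = _to_dna(sirna), _to_dna(genome)
    n = len(query)
    if len(target) < n:
        raise ValueError("genome is shorter than the siRNA")
    return min(
        sum(a != b for a, b in zip(query, target[i:i + n]))
        for i in range(len(target) - n + 1)
    )


def mismatch_bin(count: int) -> str:
    """Map a mismatch count to the categories 0, 1, 2, 3 and >3."""
    return str(count) if count <= 3 else ">3"


if __name__ == "__main__":
    toy_genome = "TTACCTTT"  # hypothetical genome fragment
    print(mismatch_bin(min_mismatches("ACGT", toy_genome)))
```

Applying this to every reference genome of a virus and tallying the bins gives the kind of per-siRNA summary described above.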



There are reports of escape mutants generated by the
virus in the siRNA target site to overcome the effect of
RNAi. These escape mutations in the target sequence
decrease the potency of siRNA gene silencing (42). Wilson
(43) observed the most escape mutations at the 12th and
18th residues for HCV NS5B, while Konishi (44)
reported appreciable mutation at the 15th residue for
the HCV NS5A gene. In another study, Jun (45) recorded
changes in Coxsackie virus at positions 10 and 13. We
have collected 57 such siRNA escape sequences, comprising
52 substitutions, 2 deletions, 1 insertion and 2
substitution/deletion mutations. The positions of the escape
substitution mutations among the 57 escape sequences are shown in
Figure 2d.

Tools 

The viral siRNA database allows users to take advantage
of useful tools such as siTarAlign, siRNAmap and
siRNAblast. siTarAlign aligns the siRNA sequence with
the respective virus/family reference genome sequences
using either BLAST (37) or the Smith-Waterman algorithm
from the EMBOSS suite (46). The example output
displays a list of flaviviruses and influenza A viruses
targeted by the respective siRNAs (Figure 3). Viral/family
reference genomes were taken from the NCBI viral genome
resources, as summarized in Supplementary Table S2.
In siTarAlign, user-defined viral genome sequences can
also be uploaded so that the siRNA sequence is aligned with
the user-provided sequences.

The 'siRNAmap' tool displays the siRNAs in our database that
perfectly match a user-provided viral sequence. It thus
shows the user how many siRNAs in VIRsiRNAdb are available
against the provided viral sequence. Additionally, the
Figure 2. Database statistics (a) Cell line used (b) siRNA efficacy (c) siRNA sequence matching with reference viral genomes (d) Positions of the
escape mutations.



siRNAblast allows alignment of a user provided siRNA 
sequence against all the siRNA sequences available in our 
database. This helps the user to confirm whether a given
siRNA sequence, or a similar one, has already been
reported.
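
The perfect-match lookup performed by siRNAmap can be sketched in a few lines. This is an illustration under stated assumptions rather than the tool's implementation: the function name sirna_map and the record identifiers are hypothetical, only the first occurrence of each siRNA is reported, and the siRNA is assumed to be stored in the same orientation as the user sequence.

```python
from typing import Dict


def _to_dna(seq: str) -> str:
    """Upper-case a sequence and write U as T."""
    return seq.upper().replace("U", "T")


def sirna_map(user_sequence: str, sirnas: Dict[str, str]) -> Dict[str, int]:
    """Return {siRNA id: 1-based start} for siRNAs found exactly in the sequence."""
    target = _to_dna(user_sequence)
    hits = {}
    for sirna_id, sirna_seq in sirnas.items():
        position = target.find(_to_dna(sirna_seq))
        if position != -1:
            hits[sirna_id] = position + 1  # report 1-based coordinates
    return hits


if __name__ == "__main__":
    # Hypothetical identifiers; the sequences are the two queries of Figure 3.
    database = {
        "example_a": "acagcatattgacacctgg",
        "example_b": "gccgagatcgcacagagactt",
    }
    user_seq = "GGGACAGCATATTGACACCTGGAAA"  # hypothetical viral fragment
    print(sirna_map(user_seq, database))
```

The number of keys in the returned dictionary is the count of database siRNAs available against the user-provided sequence.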

Data retrieval 

A quick search can be performed on various
database fields, i.e. virus name, siRNA sequence, target
region, cell line and PubMed ID. We have included a
separate search option to retrieve siRNAs with efficacy
greater than, equal to or lower than a given value.
The database also records qualitative efficacy for some siRNAs
(where numerical values were not available) in three
categories, viz. 'High' (>70%), 'Medium' (50-70%) and
'Low' (<50%). The efficacy search will also fetch
siRNAs with qualitative efficacies.
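
The efficacy categories and the threshold search can be illustrated with a minimal sketch. The record layout (dictionaries with "id" and "efficacy" keys) and the function names are assumptions made for the example, not the database schema. Records without a numerical efficacy are simply skipped here, since the text does not specify how qualitative entries are matched.

```python
import operator
from typing import Dict, List

# Comparison modes offered by the efficacy search described in the text.
COMPARATORS = {
    "greater": operator.gt,
    "equal": operator.eq,
    "lower": operator.lt,
}


def efficacy_category(percent: float) -> str:
    """Classify efficacy as High (>70%), Medium (50-70%) or Low (<50%)."""
    if percent > 70:
        return "High"
    if percent >= 50:
        return "Medium"
    return "Low"


def search_by_efficacy(records: List[Dict], comparison: str, value: float) -> List[Dict]:
    """Keep records whose numerical efficacy satisfies the comparison."""
    compare = COMPARATORS[comparison]
    return [
        record for record in records
        if record.get("efficacy") is not None
        and compare(record["efficacy"], value)
    ]


if __name__ == "__main__":
    toy_records = [  # hypothetical entries
        {"id": "a", "efficacy": 85.0},
        {"id": "b", "efficacy": 60.0},
        {"id": "c", "efficacy": None},
    ]
    for record in search_by_efficacy(toy_records, "greater", 70):
        print(record["id"], efficacy_category(record["efficacy"]))
```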

Sorting and filtering functions are implemented in the
search output. By clicking the heading of a
given field, the user can sort the displayed data.
Similarly, by entering the desired keyword in the
designated field, the user can filter the siRNA data. Multiple
filters can be applied by entering keywords
in different fields one after another. The 'Advanced Search'
page allows more flexible queries using logical
operators (AND, OR). These options enable the user to readily
find appropriate siRNA data. External links to
the GenBank accession of the siRNA target sequence, the
PubMed ID and the International Committee on Taxonomy
of Viruses (ICTV) are given for each siRNA record.

Data submission 

Authors generating experimental viral siRNA data are
encouraged to submit their data directly to the viral siRNA
database. For this purpose, a web form for data
submission is provided. Submitted information will be included
in a database update after its authenticity has been ascertained.

Implementation 

VIRsiRNAdb is implemented on Red Hat Linux
with MySQL (5.0.51b) and Apache (2.2.17) as the back-end;
the front-end of the web interface is implemented in PHP
(5.2.14).

Future developments 

As an increasing number of articles are being published in the
area of viral RNAi, our main priority in future
will be to update the existing viral siRNA data and
to include siRNA information for new viruses once
appropriate data are available. We also plan to include a
virus-specific siRNA design tool to further help
researchers.
(a)

Query: Example 1, acagcatattgacacctgg (19 nt)

Accession     Reference genome                                                Start   End     Identity (%)
NC_001437.1   Japanese encephalitis virus, genome                             10862   10880   100
NC_000943.1   Murray Valley encephalitis virus, complete genome               10900   10918   100
NC_001563.2   West Nile virus (lineage II strain 956), complete genome        10854   10872   100
NC_008718.1   Entebbe bat virus, complete genome                              10473   10491   100
NC_007580.2   St. Louis encephalitis virus, complete genome                   10832   10850   100
NC_009942.1   West Nile virus (lineage I strain NY99), complete genome        10921   10939   100
NC_012533.1   Kedougou virus, complete genome                                 10613   10631   100
NC_012534.1   Bagaza virus, complete genome                                   10833   10851   100
NC_009026.2   Bussuquara virus, complete genome                               10708   10726   100
NC_006551.1   Usutu virus, complete genome                                    10951   10969   100
NC_009029.2   Kokobera virus, complete genome                                 10770   10788   100
NC_009028.2   Ilheus virus, complete genome                                   10647   10665   94
NC_010412.1   Simian enterovirus SV19, complete genome                        3621    3604    84
NC_001477.1   Dengue virus type 1, complete genome                            10629   10647   84

(b)

Query: virsi1480, gccgagatcgcacagagactt (21 nt)

Accession     Reference genome                                                                Start   End   Identity (%)
NC_007377.1   Influenza A virus (A/Korea/426/68(H2N2)) segment 7, complete sequence           89      109   100
V01099.1      Influenza A virus (A/PR/8/1934(H1N1)) ORF1 and ORF2, genomic RNA                89      109   100
NC_004907.1   Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 7, complete seq           96      116   95
NC_007367.1   Influenza A virus (A/New York/392/2004(H3N2)) segment 7, complete seq           89      109   95
NC_007363.1   Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) strain A/Goose/Guang           89      109   95
NC_007376.1   Influenza A virus (A/Korea/426/68(H2N2)) segment 3, complete sequence           269     279   52
V01106.1      Influenza A virus (A/PR/8/1938(H1N1)) ORF1, genomic RNA                         269     279   52

Legend: red = 100% complementary sequences; . = identical residues; blue letters = mismatches; _ = gaps
[The per-residue alignment column of the screenshot is not reproduced here.]



Figure 3. siTarAlign output screenshot showing the alignment of siRNA sequence with (a) family (b) virus reference genome sequences. 



SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online: 
Supplementary Tables 1 and 2. 